This paper gives an overview of the present
status and future plans of a research project 
aimed at communicating in natural language with an 
intelligent automaton. The automaton in question 
is a computer-controlled mobile robot capable of 
autonomously acquiring information about its en- 
vironment and performing tasks normally requiring 
human supervision. By natural language communi- 
cation is meant the ability of a human to suc- 
cessfully engage the robot in a dialog using 
simple English declarative, interrogative, and 
imperative sentences. Communication is accom- 
plished by means of a natural language inter- 
pretive question-answering system (ENGROB) 
consisting of six distinct components: a syntax 
analyser, a semantic interpreter, a model of the 
robot's environment, a deductive, automatic theorem- 
proving system, an English output generator, and 
a repertoire of basic robot capabilities for sensing 
and manipulating the environment. An example is 
given that illustrates the type of processing done 
by each component, and the nature of component 
interactions. 


Descriptive Terms: Natural language, English, 
Systems, robots, intelligent 
automata. 


I. Introduction


The advent of computer-controlled robots 
capable of autonomously sensing a real-world 
laboratory environment, constructing a dynamic 
model of such an environment, and manipulating 
various objects in that environment has provided 
a unique opportunity for research in computational 
linguistics. The question of how one might apply 
current linguistic theory in the design of a con- 
versational, natural language robot communication 
system is certainly an interesting problem in its 
own right. It is the author's contention, however, 
that some aspects of linguistic theory itself 
could be significantly influenced by research in 
this area. We will examine the argument for this 
position in the conclusion. 


There are at least three projects in the country attempting to design integrated artificial intelligence systems that include a computer-controlled automaton of the type described above. Intelligent hand-eye machines are being investigated in two separate programs, one under Professors M. L. Minsky and S. Papert at MIT, and the other under Prof. J. McCarthy at Stanford University. At Stanford Research Institute we are endeavoring to build a mobile automaton capable of exploring a real-world laboratory environment. A more general discussion of the goals of the SRI robot project may be found in a paper by Nilsson,¹ while details of robot problem-solving capabilities may be found in Green.²


The present paper is based on work in progress on a system called ENGROB, a natural-language, interpretive, question-answering system used to communicate with the SRI robot in simple English sentences. Because ENGROB is not yet fully implemented, some of what follows should be considered speculation. However, simple examples based on running programs will be used to illustrate the nature of the problems encountered in natural-language communication with the SRI robot. The appendix gives a representative list of English sentences that can be processed by ENGROB, together with their translations. The list is perhaps the simplest way for the reader to obtain an intuitive feel for ENGROB's current level of performance.


The basic paradigm that has guided the development of ENGROB is (1) translate English statements, questions, and commands into a formal language based on the first-order predicate calculus; (2) perform any necessary deductive inferences based on the current set of operational axioms and the current state of the robot's model of the environment; and (3) generate, as appropriate, an English output sentence and/or a sequence of primitive functions within the set of basic robot capabilities for sensing and manipulating the environment. The initial translation to the predicate calculus is accomplished by means of syntactic and semantic analyses based on a large collection of productions, or pattern-operation rules, while deductions are carried out by means of a resolution-based automatic theorem prover. English output sentences are produced by translating answer expressions in the predicate calculus into their English equivalents, again by means of a set of productions. For comparison with standard terminology used by linguists, the predicate calculus plays essentially the same role as a natural-language "deep structure" (cf. Chomsky), and it is proper to regard the predicate calculus in ENGROB as a sort of deep structure (cf. Bohnert and Backer).

* References are listed at the end of this paper.

** ENGROB depends upon the work of many individuals in our group. B. Raphael, C. Green, R. Yates, J. Munson, and N. Nilsson have contributed to the theorem-proving component; L. Chaitin and C. Fennema have developed the FORTRAN component; R. Duda and P. Hart contributed to the vision component; and A. Robinson aided in the implementation.


If, during the course of translating the source statement, the semantic analyzer uncovers an unclear portion of the text or an unresolvable ambiguity, the system poses to the user a series of questions about the unclear portions. The character of these questions depends in part on the context of the conversation. The user's replies to these questions may be regarded as paraphrases of the unclear portions. The system then reanalyzes the text. If necessary, the system again poses questions to the user, and in this manner establishes a dialog between the user and the robot. By means of this dialog the user progressively simplifies the formulation of his task specification until it is completely understood by the system. An example will illustrate how this paradigm works in practice.
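The clarification loop just described can be sketched as follows. This is a minimal illustration, not ENGROB's code: the world model `WORLD`, the function names, and the choice to model the user's reply as a function that narrows the candidate list are all hypothetical. The example mirrors the scenario in Section IX, where "a small cube" is ambiguous and the reply "the smaller cube" resolves it.

```python
from typing import Callable, Dict, List, Set

# Hypothetical world model: object name -> properties and a perceived size.
WORLD: Dict[str, dict] = {
    "OB1": {"props": {"small", "cube"}, "size": 1.0},
    "OB2": {"props": {"small", "cube"}, "size": 1.5},
    "OB3": {"props": {"wedge"}, "size": 2.0},
}

NUMBER_WORDS = {2: "two", 3: "three", 4: "four"}

# The user's side of the dialog: given a question and the open choices, return fewer choices.
Ask = Callable[[str, List[str]], List[str]]


def candidates(props: Set[str], world: Dict[str, dict]) -> List[str]:
    """All objects that have every property named in the noun phrase."""
    return sorted(name for name, obj in world.items() if props <= obj["props"])


def resolve(description: str, props: Set[str], world: Dict[str, dict], ask: Ask) -> str:
    """Narrow a noun phrase to a single referent, questioning the user while it is ambiguous."""
    matches = candidates(props, world)
    while len(matches) > 1:
        count = NUMBER_WORDS.get(len(matches), str(len(matches)))
        # Pose a question about the unclear portion; the reply acts as a paraphrase.
        narrowed = ask(f"There are {count} {description}s.", matches)
        if not 0 < len(narrowed) < len(matches):
            raise ValueError("the reply did not narrow the choice")
        matches = narrowed
    if not matches:
        raise LookupError(f"nothing fits '{description}'")
    return matches[0]
```

A user who answers "the smaller cube" can be simulated by a function returning the candidate of least size; an unambiguous phrase such as "wedge" resolves without any question being asked.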


II. Organization of Robot Software


Figure 1 shows the organization of the robot software. The left-hand side of the figure (LISP) essentially corresponds to the ENGROB system. The right-hand side (FORTRAN) essentially corresponds to the primitive functions and reflexive actions necessary to support the robot's basic sensory-motor interaction with the real world. These functions are programmed for the most part in FORTRAN and machine language, while the higher-level routines in ENGROB are programmed for the most part in LISP. Interaction between the FORTRAN and LISP components is facilitated by a specially designed monitor called the VALET. The subcomponents of ENGROB indicated in Fig. 1 and the flow of control between them bear a strong resemblance to the organization originally suggested by Bobrow for his SENSE natural-language question-answering system.


More precisely, ENGROB is composed of six major components, which we shall consider in turn: a syntax analyzer, a semantic interpreter, an axiom model, an inferential component, an output-sentence generator, and an output-action generator.


III. Syntax Analyzer


The syntax analyzer is based on a transformational grammar for a subset of English imperative, declarative, and interrogative sentences. The vocabulary is unrestricted insofar as adjectives and nouns are concerned, and in this sense ENGROB's analyzer is similar to a transformational parser proposed by Thorne. The grammar consists of two subcomponents: a transformational component, which decomposes complex sentences into their simpler kernel sentences so that parsing can be accomplished more efficiently, and a base component, which performs that parsing and is derived from a simple phrase-structure grammar written in Backus-Naur Form.


The use of transformations in the syntax analyzer is currently restricted to string transformations that map terminal symbols into other terminals. The most conspicuous use of transformations in the current grammar is to recognize interrogative sentence forms, marked either by subject-predicate inversion or by an interrogative pronoun, and to map them into their corresponding declarative forms. These transformed declarative sentences are then passed to the base component for complete analysis. In this manner, by adding a dozen transformations to the transformational component, we avoid nearly doubling the size of the declarative base analyzer merely to handle interrogative sentences. Another simple but important use of transformations is to map plural noun and verb forms into their corresponding singular forms, so that each entity is uniquely identified in the deep structure.
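The two uses of string transformations just described can be illustrated with a small rewriter. This is a sketch in plain Python rather than ENGROB's production notation; the word lists are hypothetical, only single-word subjects are handled, and the plural test works at the character level in the spirit of the quoted patterns discussed later in this section.

```python
from typing import List, Optional, Tuple

AUXILIARIES = {"will", "can", "did", "is"}   # hypothetical, tiny word lists
WH_WORDS = {"when", "where", "what", "who"}
NOUNS = {"cube", "box", "wedge"}


def singularize(word: str) -> str:
    """Character-level test: strip '-es' or '-s' when what remains is a known noun."""
    for suffix in ("es", "s"):
        if word.endswith(suffix) and word[: -len(suffix)] in NOUNS:
            return word[: -len(suffix)]
    return word


def to_declarative(words: List[str]) -> Tuple[List[str], Optional[str]]:
    """Map a question onto declarative word order.

    Returns the rewritten words and the question type: None for a statement,
    'yes/no' for an inverted question, or the interrogative pronoun itself.
    """
    words = [singularize(w) for w in words]
    if not words or words[-1] != "?":
        return words, None
    body, kind = words[:-1], "yes/no"
    if body and body[0] in WH_WORDS:          # "when will you ..." -> remember "when"
        kind, body = body[0], body[1:]
    if len(body) >= 2 and body[0] in AUXILIARIES:
        body = [body[1], body[0]] + body[2:]  # undo subject-auxiliary inversion
    return body + ["."], kind
```

For instance, "when will you push the cube ?" becomes "you will push the cube ." with question type "when", so a single declarative base grammar can analyze statements and questions alike.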


The base component of the grammar was taken essentially without change from the GRANIS system, a predecessor of ENGROB developed by the author for application to graphical question-answering systems. Historically, this base component was implemented as a set of productions in Formula Algol. With little effort these productions (and their control programs) were then transliterated into LISP in order to maintain compatibility with the remainder of the system. In previous work this base component was expanded by first adding new rules to the BNF grammar, then applying the Earley algorithm to the BNF grammar, and finally post-editing the resulting productions to obtain an efficient one-pass, syntax-directed recognizer for the BNF grammar. In more recent work with ENGROB, however, it has been found more convenient to work directly with the productions themselves, abandoning the original BNF grammar. Thus, under the current strategy the productions are treated as a separate programming language for grammar construction, and new productions are added directly to the recognizer as needed.


The form of the productions is as follows:

L1: α → β // γ * L2;

where L1 and L2 are labels, α and β are strings, → indicates a replacement operation, γ is a sequence of semantic productions, the asterisk indicates a "read" operation taking the next word in the input string and placing it at the top of the syntactic stack, and the semicolon is a punctuation mark delimiting the scope of the production. L1, →, β, γ, and the asterisk are optional, while both diagonal bars, α, L2, and the semicolon are mandatory for each production. Flow of control for the productions is defined as follows: if, in the course of analysis, control reaches the cluster of productions labeled L1 and the right-hand portion of the contents of the syntactic stack is an instance of the pattern string α, then:

(1) Replace the portion of the stack matched by α with β (which will in general depend on the portion of the stack matched, since free class variables become bound if the match is successful).

(2) Execute the sequence γ, if present.

(3) If an asterisk is indicated, read a new word from the input string into the syntactic stack.

(4) Go to the cluster of productions labeled L2.


Otherwise, if the stack fails to match the pattern string α, control passes to the next production in the sequence. Possible elements of the pattern string α include terminal constants; class variables defined in terms of terminal constants or of boolean combinations of other classes; the pattern $1, which matches a single arbitrary constituent; and the pattern $, which may match an arbitrary number of arbitrary constituents, much as in the COMIT language. So that particular values on the stack may be referenced in the replacement portion β, a successful match may cause optional extraction variables to be bound to the value matched by a class variable. More explanation, together with examples of this process, may be found in Ref. 7.
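To make the four-step flow of control concrete, here is a minimal interpreter for productions of the form L1: α → β // γ * L2;. It is a sketch under stated simplifications: class variables are frozensets, the multi-constituent pattern $ is not supported, extraction variables are replaced by handing the matched stack items to β and γ, and the toy grammar (its category names V, N, ADJ, IMP and the end marker ".") is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

ANY1 = "$1"  # pattern element matching any single constituent


@dataclass
class Production:
    pattern: list                                     # alpha: constants, ANY1, or frozenset classes
    goto: str                                         # L2, the next cluster (mandatory)
    replace: Optional[Callable[[list], list]] = None  # beta, built from the matched items
    action: Optional[Callable[[list], None]] = None   # gamma, semantic side effects
    read: bool = False                                # the '*' read operation


def matches(pattern: list, items: list) -> bool:
    """Element-wise match; the caller guarantees equal lengths."""
    for p, item in zip(pattern, items):
        if p == ANY1:
            continue
        if isinstance(p, frozenset):
            if item not in p:
                return False
        elif p != item:
            return False
    return True


def run(grammar: Dict[str, List[Production]], start: str, words: List[str],
        accept: str = "ACCEPT") -> list:
    stack: list = []
    pending, label = list(words), start
    while label != accept:
        for prod in grammar[label]:            # try the cluster's productions in order
            n = len(prod.pattern)
            top = stack[len(stack) - n:]       # right-hand end of the stack
            if n <= len(stack) and matches(prod.pattern, top):
                if prod.replace is not None:   # (1) replace the matched portion by beta
                    stack[len(stack) - n:] = prod.replace(top)
                if prod.action is not None:    # (2) execute gamma
                    prod.action(top)
                if prod.read:                  # (3) '*': read the next input word
                    if not pending:
                        raise SyntaxError("ran out of input")
                    stack.append(pending.pop(0))
                label = prod.goto              # (4) go to cluster L2
                break
        else:                                  # no production in the cluster matched
            raise SyntaxError(f"no production in {label} matches {stack}")
    return stack


VERB = frozenset({"push", "bring"})
DET = frozenset({"the", "a"})
ADJ = frozenset({"tall", "small"})
NOUN = frozenset({"box", "cube"})


def make_grammar(sem: list) -> Dict[str, List[Production]]:
    """A toy imperative grammar; gamma actions append facts to the semantic list `sem`."""
    return {"S": [
        Production([VERB], "S", replace=lambda m: ["V"], action=lambda m: sem.append(("Act", m[0]))),
        Production([DET], "S", replace=lambda m: []),
        Production([ADJ], "S", replace=lambda m: ["ADJ"], action=lambda m: sem.append(("In", "x", m[0]))),
        Production([NOUN], "S", replace=lambda m: ["N"], action=lambda m: sem.append(("In", "x", m[0]))),
        Production(["ADJ", "N"], "S", replace=lambda m: ["N"]),
        Production(["V", "N", "."], "ACCEPT", replace=lambda m: ["IMP"]),
        Production([], "S", read=True),        # nothing else applies: read a word
    ]}
```

Parsing "push the tall box ." reduces the stack to ["IMP"] while the γ actions collect Act(push), In(x, tall), and In(x, box) on a separate list, loosely imitating the separate semantic stack described in Section IV.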


Transformational productions have the same 
form as base component productions except for the 
fact that the scanning for a match is from left 
to right across the entire sentence rather than 
from right to left across the syntactic stack. Any 
pattern-element sequence can be quoted, indicating 
that pattern matching is to be accomplished at the 
character level in a particular word rather than 
at the lexical level, and in this manner testing 
for plurals and standard suffixes or prefixes can 
be achieved. 


One of the difficulties uncovered by our research on the syntactic component was a purely pragmatic one. We were disconcerted to find that as we added more and more transformations to the grammar, the time for processing kernel sentences (to which transformations are inapplicable) increased in proportion to the complexity of the grammar. This was clearly an unacceptable state of affairs, since in an ideal implementation the processing time for kernel sentences should remain essentially constant, regardless of the number of transformations. This led us to the notion of distributing the transformations throughout the base component, thereby blurring the distinction between the two subcomponents in our implementation and leaving us with a grammar that, although technically not a transformational grammar, still has transformational power. Preliminary evidence shows that this approach yields a marked improvement in parsing efficiency, but we do not yet perceive any theoretical implications in this strategy.


IV. Semantic Interpreter


Translation of a well-formed English source statement into an equivalent well-formed formula in the first-order predicate calculus is accomplished by means of a set of semantic productions interleaved with the syntactic productions. The semantic productions have the same form and flow of control as the syntactic productions, except that the * operation is never used and the productions operate on a separate semantic stack. The method of integrating the syntactic and semantic analysis within a common production framework has been called Syntax-Directed Interpretation, and examples of this process can also be found in Ref. 7.


The appendix shows thirty sample sentences together with their translations into the predicate calculus. Declarative sentences, such as S1-S10, are entered directly into the question-answering system as axioms; interrogative sentences, such as Q1-Q10, are submitted as assertions to be proved by the inferential component. Simple requests, such as C1-C4, are translated directly into FORTRAN commands (cf. Table 1) and passed to the command interpreter. Complex imperatives, such as C5-C10, are treated as assertions about the possibility of discovering a sequence of primitive actions that accomplishes a task subject to specified constraints; they are therefore treated in much the same fashion as questions.
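The routing just described amounts to a dispatch on sentence type. In this minimal sketch the kind labels are hypothetical, the `prove` and `execute` callbacks are stand-ins for QA3 and the FORTRAN command interpreter, and formulas are treated as opaque strings.

```python
from typing import Any, Callable, List


def route(kind: str, wff: str, axioms: List[str],
          prove: Callable[[str, List[str]], Any],
          execute: Callable[[str], Any]) -> Any:
    """Send a translated sentence to the component the text assigns it to."""
    if kind == "declarative":                  # S1-S10: entered directly as axioms
        axioms.append(wff)
        return "OK"
    if kind in ("interrogative", "complex-imperative"):
        return prove(wff, axioms)              # Q1-Q10, C5-C10: assertions to be proved
    if kind == "simple-imperative":
        return execute(wff)                    # C1-C4: straight to the command interpreter
    raise ValueError(f"unknown sentence kind: {kind!r}")
```

Treating complex imperatives like questions is the design point: both reduce to asking the prover whether something can be established, and a command differs only in that the witnessing action sequence is then executed.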


Most of the predicates are self-explanatory, but a detailed look at one of the more difficult sentences, S10, will serve as a guide to understanding the remaining formulas. The syntactic component first determines that the active/passive voice transformation applies; this transformation maps the given sentence into its active form: "John pushed the tall box." Next, the base component recognizes the past tense of the verb "push" and sets the Time predicate accordingly. The adjective "tall" and the noun "box" each map into the In predicate. The final well-formed formula can be interpreted roughly as follows: there exists a state s, an object x, and places y and z such that s is equal to a state obtained from some initial state S₀ by having John push x from y to z, where x is characterized by being both in the class of tall objects and in the class of boxes, and, furthermore, s happened in the past. Note that the letter R (for Robot) in some of the other sentences corresponds to the antecedent of the pronoun "you".
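For reference, the paraphrase above corresponds to a formula of roughly the following shape. This is a reconstruction from the prose, not a copy of the appendix entry: in particular, expressing the past tense as Time(s, t) ∧ Past(t), and the extra quantified variable t this requires, are assumptions modeled on the Time(s, t) ∧ Future(t) conjuncts used for questions in Section IX.

```latex
(\exists s, t, x, y, z)\,\big[\, s = \mathrm{Push}(\mathrm{John}, x, y, z, S_0) \land \mathrm{In}(x, \mathrm{Tall}) \land \mathrm{In}(x, \mathrm{Box}) \land \mathrm{Time}(s, t) \land \mathrm{Past}(t) \,\big]
```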


The predicate calculus has thus far proved to be a sufficiently powerful internal representation for capturing the meaning of our simple English sentences. As our grammar expands to handle increasingly complex sentences, it will probably continue to serve, with a few minor modifications, as our "deep structure" representation. An additional advantage of using the predicate calculus as an internal representation for the meaning of English sentences is that we then have a common language for representing both the linguistic and the nonlinguistic information about the world that is vital to intelligent communication with the robot. Moreover, we capitalize on the effort expended by logicians in establishing the logical properties of the predicate calculus, and can precisely describe the class of deductions possible within our framework. The logical limitations of competing representations, such as directed graph networks or description lists, are not always obvious.
One of the theoretical issues uncovered by our work on the semantic component was how to find a canonical set of predicates for describing actions. What is being sought is a reasonably small but complete list of predicates that could exhaustively describe all the essential features of an action. A tentative list based on work by N. Rescher is presented below:


Predicate                         Question

(1) Agent(x, y)                   Who did it?
(2) Act(x, y)                     What did he do?
(3) Object(x, y)                  To what or whom did he do it?
(4) Setting:                      In what context did he do it?
      Itime(x, y), Ftime(x, y)    When did he do it?
      Iloc(x, y), Floc(x, y)      Where did he do it?
      Circ(x, y)                  Under what circumstances did he do it?
(5) Modality:                     How did he do it?
      Means(x, y)                 By what instrument or method did he do it?
      Manner(x, y)                In what manner did he do it?
(6) Rationale:                    Why did he do it?
      Cause(x, y)                 What caused him to do it?
      Aim(x, y)                   With what intent did he do it?
      Mentality(x, y)             In what state of mind did he do it?


Our basic premise is that the adequacy of any deep-structure representation should be measured by the class of questions that can easily be answered by the data when represented in that form; hence the requirement for tying the predicates closely to questions that can reasonably be asked about an action. We are not yet certain that the above list is generally adequate, but for the purposes of our robot work it seems sufficient for the time being.
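One way to make this premise concrete is to treat each predicate as the answer key for one question and check which questions an encoding can answer. The sketch assumes that the first argument of each predicate names the action and the second holds the value (the text gives only Agent(x, y) and so on), reads the I and F prefixes as "initial" and "final", and uses hypothetical identifiers throughout.

```python
from typing import Dict, List, Optional, Set, Tuple

Fact = Tuple[str, str, str]   # (predicate, action, value)

# A subset of the questions listed above, each keyed to the predicates that answer it.
QUESTIONS: Dict[str, List[str]] = {
    "Who did it?": ["Agent"],
    "What did he do?": ["Act"],
    "To what or whom did he do it?": ["Object"],
    "When did he do it?": ["Itime", "Ftime"],
    "Where did he do it?": ["Iloc", "Floc"],
    "By what instrument or method did he do it?": ["Means"],
    "What caused him to do it?": ["Cause"],
}


def describe(action: str, **slots: str) -> Set[Fact]:
    """Encode one action as binary facts Pred(action, value)."""
    return {(pred, action, value) for pred, value in slots.items()}


def answer(facts: Set[Fact], action: str, question: str) -> Optional[Dict[str, str]]:
    """Look up the predicates tied to a question; None means this encoding cannot answer it."""
    wanted = QUESTIONS[question]
    found = {pred: value for pred, a, value in facts if a == action and pred in wanted}
    return found or None
```

Encoding "John pushed the box from P1 to P2" this way answers "Who did it?" and "Where did he do it?" but returns None for "By what instrument or method did he do it?", which is exactly the kind of gap the premise asks us to measure.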


V. The Axiom Model 


Three kinds of information are contained in the axiom model: geometric relationships represented in the grid model, rules describing constraints on the robot's capabilities for sensing and manipulating the world, and descriptive information extracted from declarative sentences obtained during conversation with humans. The grid model describes the position, size, and orientation of various objects and obstacles by partitioning a plan view of the robot's environment and imposing a Cartesian coordinate system. Axioms about the position and orientation of various objects, including the robot, are entered into the axiom model automatically as they are updated in the grid model. Axioms that describe the initial and boundary conditions for various sequences of primitive FORTRAN commands are entered here permanently and are used during problem-solving and question-answering operations. The axiom model grows dynamically during the course of conversation as humans type declarative sentences, since these statements are translated into the predicate calculus and entered directly into the store of axioms that can be used for future inferences.
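A toy version of the automatic updating just described follows. It is a sketch only: position is modeled but not size or orientation, cells are squares of an assumed size, facts are plain tuples, and the class and method names are hypothetical.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]


class AxiomModel:
    """Toy axiom store fed by a grid model and by conversation (hypothetical design)."""

    def __init__(self, cell_size_ft: float = 1.0):
        self.cell_size = cell_size_ft
        self.grid: Dict[Cell, str] = {}        # plan-view cell -> occupying object
        self.axioms: Set[tuple] = set()        # ground facts, e.g. ("At", "OB1", (3, 0))

    def cell_of(self, x_ft: float, y_ft: float) -> Cell:
        """Partition the Cartesian plan view into square cells."""
        return (int(x_ft // self.cell_size), int(y_ft // self.cell_size))

    def place(self, name: str, x_ft: float, y_ft: float) -> None:
        """Record a perceived position; the matching At axiom is updated automatically."""
        self.grid = {c: o for c, o in self.grid.items() if o != name}
        self.axioms = {a for a in self.axioms if not (a[0] == "At" and a[1] == name)}
        cell = self.cell_of(x_ft, y_ft)
        self.grid[cell] = name
        self.axioms.add(("At", name, cell))

    def tell(self, fact: tuple) -> None:
        """A declarative sentence, already translated, becomes a new axiom."""
        self.axioms.add(fact)
```

Moving an object replaces its old At fact rather than adding a second one, so the prover never sees two conflicting positions for the same object.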

VI. The Inferential Component


Deductions are implemented by means of a highly efficient, automatic, deductive theorem-proving system, QA3, developed by Green and Raphael and based on Robinson's resolution procedure. QA3 discovers proofs by refutation. To prove a theorem by refutation, one first hypothesizes the negation of the theorem and then attempts to obtain a contradiction, if one exists, by attempting and failing to construct a model that satisfies both the axioms and the negation of the theorem. If no such model can be found, QA3 has achieved a constructive proof of the affirmative statement of the theorem, and can answer not merely YES or NO as to whether the original hypothesis was a theorem, but also for what values of the existentially quantified variables the theorem holds. It is this important feature of QA3 as a theorem prover that permits its application to question answering and problem solving.


QA3's efficiency is greatly enhanced by the addition of a number of completeness-preserving heuristics, i.e., heuristics that limit the scope of the search for a proof without violating the logical completeness of the basic resolution procedure. The discovery of new heuristics of this type appears to be a fruitful area for future research.
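Refutation with answer extraction, combined with one completeness-preserving restriction, can be illustrated with a toy resolution prover. This is not QA3: it handles only function-free clauses, adds an answer literal Ans(x) to the negated goal (the usual answer-extraction device, assumed here rather than taken from the text), and limits search with the set-of-support strategy, a standard completeness-preserving restriction; whether QA3 uses exactly this strategy is not stated above. The facts are hypothetical, echoing the two small cubes of Section IX.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Literal = Tuple[bool, str, Tuple[str, ...]]   # (positive?, predicate, arguments)
Clause = FrozenSet[Literal]


def is_var(term: str) -> bool:
    return term.startswith("?")


def walk(term: str, s: Dict[str, str]) -> str:
    while is_var(term) and term in s:
        term = s[term]
    return term


def unify(a: Tuple[str, ...], b: Tuple[str, ...], s: Dict[str, str]):
    """Unify argument lists of constants and variables (no function symbols)."""
    if len(a) != len(b):
        return None
    s = dict(s)
    for x, y in zip(a, b):
        x, y = walk(x, s), walk(y, s)
        if x == y:
            continue
        if is_var(x):
            s[x] = y
        elif is_var(y):
            s[y] = x
        else:
            return None
    return s


def rename(clause: Clause, k: int) -> Clause:
    """Standardize variables apart before resolving two clauses."""
    return frozenset((p, f, tuple(f"{t}_{k}" if is_var(t) else t for t in args))
                     for p, f, args in clause)


def resolvents(c1: Clause, c2: Clause) -> List[Clause]:
    """All binary resolvents of c1 and c2 on complementary, unifiable literals."""
    out = []
    for l1 in c1:
        for l2 in c2:
            if l1[0] != l2[0] and l1[1] == l2[1]:
                s = unify(l1[2], l2[2], {})
                if s is not None:
                    rest = (c1 - {l1}) | (c2 - {l2})
                    out.append(frozenset((p, f, tuple(walk(t, s) for t in args))
                                         for p, f, args in rest))
    return out


def answer_values(axioms: List[Clause], goal: Clause, max_rounds: int = 6) -> Set[FrozenSet]:
    """Refute the negated goal; every clause left with only Ans literals is an answer."""
    support, frontier, seen = [goal], [goal], {goal}
    answers, k = set(), 0
    for _ in range(max_rounds):
        new = []
        for c1 in frontier:              # set of support: c1 always descends from the goal
            for c2 in axioms + support:
                k += 1
                for r in resolvents(c1, rename(c2, k)):
                    if r and all(f == "Ans" for _, f, _ in r):
                        answers.add(frozenset(args for _, _, args in r))
                    elif r not in seen:
                        seen.add(r)
                        new.append(r)
        if not new:
            break
        support += new
        frontier = new
    return answers
```

With In(OB1, Small), In(OB1, Cube), In(OB2, Small), In(OB2, Cube), and In(OB3, Small) as axioms, the negated goal ¬In(x, Small) ∨ ¬In(x, Cube) ∨ Ans(x) yields the answers x = OB1 and x = OB2: a YES together with the witnessing values, as described above. Axiom-against-axiom resolutions are never attempted, which is where the saving comes from.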


VII. The Output-Sentence Generator 


Output sentences are produced by means of a small generative grammar based on the same productions described earlier. Thus, this sort of rewrite rule appears in all of the linguistic components of the system. The form of the reply sentence is frequently determined by applying a simple transformation to the input question or command. For example, "Will you do x?" may give rise to "Yes, I will." or "No, I will not do x." Occasionally, however, the output sentence will have a nontrivial syntax, i.e., one that is not immediately obtainable from a simple transformation of the input. For example, "Move ten feet forward" may give rise to the reply "I can move only five feet because there is a wall in front of me." Here we see that semantic information contained in the axiom model determines, in part, the form of the reply.
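Both kinds of reply can be sketched in a few lines. The string patterns and function names are hypothetical, and the free distance and obstacle are passed in directly rather than looked up in the axiom model, which is a simplifying assumption.

```python
import re


def reply_to_will_question(question: str, will_do: bool) -> str:
    """Transform 'Will you X?' into 'Yes, I will X.' or 'No, I will not X.'"""
    m = re.fullmatch(r"Will you (.+)\?", question.strip())
    if m is None:
        raise ValueError("not a 'Will you ...?' question")
    act = m.group(1)
    return f"Yes, I will {act}." if will_do else f"No, I will not {act}."


def reply_to_move(requested_ft: float, clear_ft: float, obstacle: str) -> str:
    """The form of this reply depends on the model (free distance, obstacle), not on the input."""
    if requested_ft <= clear_ft:
        return "OK"
    return f"I can move only {clear_ft:g} feet because there is {obstacle} in front of me."
```

The first function is a pure transformation of the input; the second shows why nontrivial replies need the model, since nothing in "Move ten feet forward" mentions a wall.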


VIII. The Output-Action Generator 


The result of most imperative sentences (as well as of certain interrogative sentences whose answers require not only information contained in the model, but also information that must be obtained by inspection of the real world) will be a sequence of two-letter FORTRAN commands with appropriate arguments, which are then passed to the FORTRAN subsystem for execution. Table 1 shows a partial list of these commands to give the reader a better understanding of the robot's repertoire of basic actions. Upon execution, each command returns to ENGROB information about its success and any sensory data acquired, for incorporation into the model. In this manner ENGROB can monitor the progress of the robot in executing a sequence of primitive commands, and assign a new sequence or subsequence as necessary when unanticipated obstacles arise.
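The monitoring loop described above can be sketched as follows. Because Table 1 is not reproduced here, the two-letter codes in the usage example (MV, TU) and all names are hypothetical; `run` stands in for the FORTRAN subsystem and `replan` for whatever component proposes a new subsequence.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Command = Tuple  # e.g. ("MV", 10); the codes used here are hypothetical


def execute_plan(plan: Iterable[Command],
                 run: Callable[[Command], Tuple[bool, Dict]],
                 replan: Callable[[Command, Dict], Iterable[Command]],
                 model: Dict,
                 max_replans: int = 3) -> List[Tuple[Command, bool]]:
    """Run commands one at a time, fold each report into the model, and replan on failure."""
    queue = list(plan)
    log: List[Tuple[Command, bool]] = []
    replans = 0
    while queue:
        cmd = queue.pop(0)
        ok, report = run(cmd)          # the subsystem reports success and any sensory data
        model.update(report)           # incorporate the report into the model
        log.append((cmd, ok))
        if not ok:
            if replans == max_replans:
                raise RuntimeError(f"gave up after {replans} replans")
            replans += 1
            queue = list(replan(cmd, model))   # new subsequence given what was just learned
    return log
```

If a ten-foot move fails because a wall is sensed five feet ahead, the report lands in the model before `replan` is called, so the replacement subsequence can use the newly sensed distance.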


IX. An Example


Now that we have examined the various compon- 
ents of ENGROB individually, let us see how they 
actually interact by means of a concrete example. 
Consider the following scenario which we expect 
to accomplish during the next few months: 


Scene: Two people are seated at teletypes in the robot room, which is filled with various cubes and wedges.

Time: 2:45 p.m.

Person₁: Bring me a small cube at 3:00 p.m.

Robot: There are two small cubes.

Person₁: Bring me the smaller cube.

Robot: OK.

Person₁: Will you push a small cube?

Robot: Yes, I will push a small cube.

Person₁: When will you push the cube?

Robot: I will push the cube at 3:00 p.m.

Time: 3:01 p.m.

Robot: I have brought you a small cube.

Person₁: Thank you.


The first step in processing the sentence "Bring me a small cube at 3:00 p.m." is to translate it into the predicate calculus. The syntactic component establishes that it is a well-formed imperative sentence, and the semantic component actually carries out the translation, giving

$$C:\ (\exists s, x)\,[\,\mathrm{At}(x, \mathrm{Person}_1, s) \land \mathrm{In}(x, \mathrm{Small}) \land \mathrm{In}(x, \mathrm{Cube}) \land \mathrm{Time}(s, 1500)\,]$$

The "C" asserts that the logical type of the following wff is "command." The wff itself asserts that there exists a state s and an object x such that the object is at Person₁ in state s, the object is small, the object is a cube, and the state occurs at time 1500.


The next step is to pass the wff to QA3 as an assertion to be proved. Let us assume that among our data base of facts about the environment we have:


In addition, we have an axiom of the form

$$(\forall s, x, y, z)\,[\,\mathrm{At}(x, y, s) \supset \mathrm{At}(x, z, \mathrm{Push}(R, x, y, z, s))\,]$$

meaning that if an object x is at location y in state s, then it will be located at z in the state that results from the robot, R, pushing x from y to z. Furthermore, we know that Time(x, y) is an evaluable predicate. Then, under the condition that Time(s, 1500) evaluates to true, QA3 will reply:

yes, if x = OB₁ and s = Push(R, OB₁, P₁, Person₁, S₀),
or x = OB₂ and s = Push(R, OB₂, P₂, Person₁, S₀).


This reply in turn guides the generation of the output sentence "There are two small cubes" by the output-sentence generator. The conversation continues, and, assuming that Size(OB₂) > Size(OB₁) has been established from perceptual rather than linguistic information, the output-action generator will submit the command "PU x₁, y₁, r, x₂, y₂" when the clock time is 1500, where (x₁, y₁) are the coordinates of P₁, r is the radius of OB₁, and (x₂, y₂) are the coordinates of Person₁.


The conversation with Person₁ also yields translations into the predicate calculus of the form

$$Q:\ (\exists s, t, x, y, z)\,[\,s = \mathrm{Push}(R, x, y, z, S_0) \land \mathrm{In}(x, \mathrm{Small}) \land \mathrm{In}(x, \mathrm{Cube}) \land \mathrm{Time}(s, t) \land \mathrm{Future}(t)\,]$$

with replies from QA3 of the form

yes, if s = Push(R, OB₁, P₁, Person₁, S₀), t = 1500, x = OB₁, y = P₁, z = Person₁.

These replies are then used to generate the answer sentences.


The reply sentence "I have brought you a small cube" is generated automatically by the successful execution of the PU FORTRAN command.

X. Implications for Linguistics

The complexity of the processing necessary to accomplish the superficially simple-minded task described in the previous section is enormous. In fact, the syntactic and semantic analysis necessary to translate these expressions into the predicate calculus and the transformations necessary to generate the output replies comprise less than half of the total processing required. The greatest portion of the processing time is consumed by the theorem prover, QA3, in establishing the feasibility of composing a sequence of primitive functions which, when executed, will accomplish the desired goal. Preliminary evidence indicates that as humans seek more substantive conversations with the robot, inferential requirements will grow, with the result that QA3 will consume a still greater proportion of the total processing time.


During a demonstration, many people, their enthusiasm whetted by some preliminary success with a comparatively trivial command like "Turn right," will go on to pose a problem for the robot that is dramatically beyond its present capability. There seems to be a machine-like regularity in most people's failure to appreciate the tacit assumptions regarding geometric space-time relationships, and the volume of unarticulated knowledge about the possibility of, and constraints on, certain kinds of behavior, that are implicit in the simple English sentences they type to the robot. This phenomenon is vaguely reminiscent of the master chess player who is unable to articulate to the designer of a chess-playing program the principles by which he plays master-level chess.


As I see it, the implication this observation has for linguistics is as follows: insofar as linguists seek to explain how people "understand" language, they will have to shift some of their attention away from the grammatical aspects of language (generating and parsing sentences) and focus more attention on how people bring their immense data base of knowledge about the world to bear, in a relevant manner, on the comprehension of a string of lexical items in the context of a particular universe of discourse. Furthermore, they will have to focus on the methods by which people bring their knowledge to bear even when inferences are required at several levels of indirectness.


Because of the enormous complexity of this 
total process, and because humans appear to do it 
in what seems to be negligible time and effort, 
there is a strong temptation to describe the 
process as some kind of Gestalt phenomenon, not 
describable in terms of a collection of analytic 
procedures. Based on the preliminary results of 
experiments with our robot, however, I suggest
that such an interpretation is erroneous. The 
fact that humans are largely unaware of all the 
linguistic analysis and data analysis they perform, 
and that they can perform it quickly, does not 
constitute evidence that they don't do analysis. 
Our robot can perform simple tasks today that 
occasionally provide surprising evidence for its 
understanding of language, even though that under- 
standing is limited, the processing time is 
lengthy, and the motions of the vehicle are awkward. 


The technology of robot hardware is bound to 
improve, as is the technology of computer hard- 
ware. In my judgement a conclusive demonstration 
of robot understanding of natural language is 
still a long way in the future, but ENGROB does 
serve as a demonstration that computer under- 
standing of language that refers to the real 
world is possible. Furthermore, it shows that 
one could build a system around the principles 
discussed above that would permit robots and men 
to communicate in restricted English in real- 
world environments. 


XI. Conclusion


We have been discussing problems in the de- 
sign and organization of a computer program that 
can permit robots and people to communicate in 
natural language. Progress on these problems 
thus far has been limited to a few simple scen- 
arios that systematically exercise all of the 
basic capabilities of the hardware. Work is 
underway, however, on each of the six components 
of ENGROB. The transformational grammar is being 
extended to include nonterminal transformations; 
the predicate calculus is being expanded to 
handle a larger family of quantifiers; the class 
of updatable predicates will be augmented in the 
axiom model; QA3's heuristics are being refined; 
the output sentence generator will subsequently 
draw on QA3 for producing semantically relevant
replies; and the action generator will have closer 
feedback with reality. 


Before creating a false sense of optimism 
that dramatic improvements are just over the 
horizon, I might add that even if appropriate
progress were made in each of the components, 
there still remain, among other problems, enormous 
systemic difficulties in integrating the com- 
ponents into a functioning whole properly embedded 
within a complex time-sharing system operating in 
a hardware environment of partial uncertainty. 
Frequently, one spends as much of one's time on 
these systemic problems, getting the robot to 
operate on a day-to-day basis, as on the major 
theoretical issues. Other difficulties appear on the 
horizon. Can we really ever get to investigate 
the nontrivial problems we would like to, within 
the memory-response-time limitations of our hard- 
ware? Will the predicate calculus prove inadequate 
as a natural-language deep structure, even when 
augmented by probabilistic, multi-valued, or
modal logic? Can the semantic component partici- 
pate in the parsing process so as to resolve 
lexical and syntactic ambiguity with respect to 
the universe of discourse? Will our aspirations 
falter because the vision routines for years to 
come will never be able to recognize anything more 
complex than the difference between a cube and a 
triangular prism? Speculation of this kind is 
sobering, but our immediate goals are well defined, 
and only future research will tell whether our 
underlying optimism Is justified. 


Finally, let us return to the conjecture made in the introduction that certain portions of linguistic theory itself will be influenced by natural language communication with robots. The argument goes as follows. First, because robots provide the computer with a "window on the real world," they offer a host of new opportunities for empirically studying the relationship between language and reality.* But this relationship falls by definition under semantics, precisely the portion of linguistics that has received comparatively little theoretical attention thus far.13,14,16 In particular, the work reported in this paper on space-time relationships, such as those illustrated in the sample scenario, has forced us to think more carefully about how to encode the meaning of statements about real-world activities. Statements of this sort, which reference space-time relationships, abound in all human conversation as well as in the most elementary children's books, and an adequate model of these relations must be an essential ingredient of any theory of semantics. Robots will thus serve as a kind of laboratory for testing the adequacy of our semantic representations and our logics, and may ultimately reveal new approaches to these basic questions.


In addition, robots have a number of purely philosophical implications. For the first time we have an opportunity to investigate empirically such important philosophical questions as "free will" or "self-awareness." We will be required to define in a precise and operational manner such concepts as possibility and necessity, as well as others such as can, cause, knows, believes, understands, and so on. These in turn must be based on epistemologically and metaphysically adequate representations of reality, together with logical formalisms suitable for inference making and problem solving. Here again, robots will serve as a basis for empirical investigations that heretofore could be conducted only from the armchair.


* It is not implied here that robots are necessarily the only way this might be done, but rather that they are one of the more convenient methods of achieving this goal.


APPENDIX


Sample English Sentences With Translations Into Predicate Calculus


Declarative Sentences

D1. All men are mortal.
    $(\forall x)[In(x, man) \supset In(x, mortal)]$
D2. If John is a man then he is mortal.
    $In(John, man) \supset In(John, mortal)$
D3. John and Fred are tall thin men.
    $In(John, tall) \wedge In(John, thin) \wedge In(John, man) \wedge In(Fred, tall) \wedge In(Fred, thin) \wedge In(Fred, man)$
D4. Some tall men are not boys.
    $(\exists x)[In(x, tall) \wedge In(x, man) \wedge \neg In(x, boy)]$
D5. No man is a woman.
    $\neg(\exists x)[In(x, man) \wedge In(x, woman)]$
D6. John has two hands.
    $Hasp(John, hand, 2)$
D7. Every hand has five fingers.
    $(\forall x)[In(x, hand) \supset Hasp(x, finger, 5)]$
D8. Anything that is green is a tall thin box.
    $(\forall x)[In(x, green) \supset In(x, tall) \wedge In(x, thin) \wedge In(x, box)]$
D9. Any box smaller than a green cube that is on the right is a red and white prism.
    $(\forall x)[In(x, box) \wedge (\exists y)[Smaller(x, y) \wedge In(y, green) \wedge In(y, cube) \wedge (\forall z)[Right(y, z)]] \supset In(x, red) \wedge In(x, white) \wedge In(x, prism)]$
D10. The tall box was pushed by John.
    $(\exists s, x, y, z)[Eq(s, Push(John, x, y, z, s_1)) \wedge In(x, tall) \wedge In(x, box) \wedge Time(s, PAST)]$


Interrogative Sentences

Q1. Is there a man?
    $(\exists x)[In(x, man)]$
Q2. Is Jane a man?
    $In(Jane, man)$
Q3. Who is Jane?
    $(\exists x)[In(Jane, x)]$
Q4. How many fingers does John have?
    $(\exists x)Hasp(John, finger, x)$
Q5. Which box is the cube near the door?
    $(\exists x)[In(x, box) \wedge In(x, cube) \wedge (\exists y)[Near(x, y) \wedge In(y, door)]]$
Q6. Are there any boxes on your left?
    $(\exists x)[In(x, box) \wedge Left(x, R)]$
Q7. Where are you?
    WH *
Q8. Will you push the box?
    $(\exists s, x, y, z)[Eq(s, Push(R, x, y, z, s_1)) \wedge In(x, box) \wedge Time(s, FUTURE)]$
Q9. When will you push the box?
    $(\exists s, t, x, y, z)[Eq(s, Push(R, x, y, z, s_1)) \wedge In(x, box) \wedge Time(s, t) \wedge Future(t)]$
Q10. Did you push a box yesterday?
    $(\exists s, x, y, z)[Eq(s, Push(R, x, y, z, s_1)) \wedge In(x, box) \wedge Time(s, YESTERDAY)]$


Imperative Sentences

C1. Stop.
    ST *
C2. Turn around.
    TU 180., *
C3. Move ten feet.
    MO 10., *
C4. Turn right 45 degrees.
    TU -45., *
C5. Go to the big red prism.
    $(\exists s, x)[At(R, x, s) \wedge In(x, big) \wedge In(x, red) \wedge In(x, prism)]$
C6. Push the black box on top of the platform.
    $(\exists s, x, y, z)[Pushed(x, s) \wedge In(x, black) \wedge In(x, box) \wedge On(x, y) \wedge In(y, top) \wedge Of(y, z) \wedge In(z, platform)]$
C7. Move the wedge that is on the left to the platform.
    $(\exists s, x, y)[At(x, y, s) \wedge In(x, wedge) \wedge (\forall z)[Left(x, z)] \wedge In(y, platform)]$
C8. Roll up the ramp next to the platform.
    $(\exists s, x, y)[On(R, x, s) \wedge In(x, ramp) \wedge Next(x, y) \wedge In(y, platform)]$
C9. Collect all the cubes into the center of the room.
    $(\forall x)(\exists s, y, z)[At(x, y, s) \wedge In(x, cube) \wedge In(y, center) \wedge Of(y, z) \wedge In(z, room)]$
C10. Explore John's office.
    $(\exists s, x)[Explored(x, s) \wedge In(x, office) \wedge Of(x, John)]$

NOTE: As of this writing, sentences C6, C7, and C10 have not yet been executed as robot tasks, although ENGROB translates them correctly into the predicate calculus. We expect to have the robot carry out these commands during the next few months.
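The translations above can be exercised with a small finite-model check. What follows is a minimal sketch, not ENGROB's method: ENGROB hands such formulas to QA3, which proves them from axioms, whereas this sketch evaluates a handful of them (D1, D5, Q1, Q2, Q3, Q5, Q6) by enumerating a hypothetical toy world. The world contents, the individuals `b1` and `d1`, the encoding of In, Near, and Left as Python sets of tuples, and the closed-world treatment of unlisted facts are all assumptions made for illustration.

```python
# Hypothetical toy world model. Closed-world assumption: any fact not
# listed is taken to be false (a simplification; QA3 proves from axioms).
IN = {  # In(x, c): individual x belongs to class c
    ("John", "man"), ("John", "tall"), ("John", "thin"), ("John", "mortal"),
    ("Fred", "man"), ("Fred", "tall"), ("Fred", "thin"), ("Fred", "mortal"),
    ("Jane", "woman"),
    ("b1", "box"), ("b1", "cube"), ("b1", "green"),
    ("d1", "door"),
}
NEAR = {("b1", "d1")}  # Near(x, y)
LEFT = {("b1", "R")}   # Left(x, R): x is on the robot's left

# Finite domain of individuals; "R" names the robot.
DOMAIN = sorted({x for x, _ in IN} | {"R"})


def In(x, c):
    return (x, c) in IN


def implies(p, q):
    return (not p) or q


def forall(pred):
    return all(pred(x) for x in DOMAIN)


def exists(pred):
    return any(pred(x) for x in DOMAIN)


def witnesses(pred):
    """Individuals satisfying pred; used to answer 'which' questions."""
    return [x for x in DOMAIN if pred(x)]


answers = {
    # D1: (forall x)[In(x, man) => In(x, mortal)]
    "D1": forall(lambda x: implies(In(x, "man"), In(x, "mortal"))),
    # D5: ~(exists x)[In(x, man) & In(x, woman)]
    "D5": not exists(lambda x: In(x, "man") and In(x, "woman")),
    # Q1: (exists x)[In(x, man)]
    "Q1": exists(lambda x: In(x, "man")),
    # Q2: In(Jane, man)
    "Q2": In("Jane", "man"),
    # Q3: (exists x)[In(Jane, x)]; here x ranges over class names
    "Q3": sorted(c for (who, c) in IN if who == "Jane"),
    # Q5: (exists x)[In(x, box) & In(x, cube) & (exists y)[Near(x, y) & In(y, door)]]
    "Q5": witnesses(lambda x: In(x, "box") and In(x, "cube")
                    and exists(lambda y: (x, y) in NEAR and In(y, "door"))),
    # Q6: (exists x)[In(x, box) & Left(x, R)]
    "Q6": exists(lambda x: In(x, "box") and (x, "R") in LEFT),
}

for key, value in answers.items():
    print(key, value)
```

Enumeration works here only because the toy model is finite and fully specified. A question such as Q4, whose answer must be derived by combining D6 and D7, calls for genuine deduction of the kind QA3 performs.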


Table 1. BASIC FORTRAN COMMANDS

Command              Explanation

ST                   Stop.
L                    Clear the model.
N                    Read the model.
MO N                 Move forward N feet.
MA                   Display a map of the room.
TU N                 Turn counterclockwise N degrees.
XR N                 Set the current x-coordinate of the robot to N.
YR N                 Set the current y-coordinate of the robot to N.
XG N                 Set the x goal coordinate to N.
YG N                 Set the y goal coordinate to N.
AN N                 Set the current angle of the robot to N degrees.
GO                   Go to the goal (by touch sensors only).
TE                   Plan a journey to the goal using vision, and execute it.
PI X,Y               Take a picture at location (X, Y).
IR                   Iris.
FO                   Focus.
TI N                 Tilt the camera N degrees.
PA N                 Pan the camera N degrees.
SC N                 Scan the room in N steps with the range finder.
OV N                 Turn overrides on or off as a function of N.
PU X1,Y1,R,X2,Y2     Take the object located at (X1, Y1) of radius R to the goal location (X2, Y2).
WH                   Print the current values of XR, YR, and AN.
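The imperative translations C1 through C4 are simply strings of these basic commands, such as `TU 180., *` or `MO 10., *`. The following minimal sketch interprets a few of them as dead-reckoning updates to the robot's pose. The geometry and syntax are assumptions, not taken from the paper: AN is treated as degrees measured counterclockwise from the positive x-axis, MO as a straight-line move along that heading, a trailing comma on a numeric field is dropped, and `*` is read as a line terminator. The names `Pose`, `execute`, and `run` are hypothetical; sensing, vision, and the remaining commands are not modeled.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    """Robot state named after the Table 1 variables."""
    xr: float = 0.0  # XR: current x-coordinate, in feet
    yr: float = 0.0  # YR: current y-coordinate, in feet
    an: float = 0.0  # AN: heading in degrees, counterclockwise from +x (assumed)


def execute(command: str, pose: Pose, out: list[str]) -> bool:
    """Apply one command string such as 'MO 10., *' to pose.

    Returns False when the command is ST (stop), True otherwise.
    """
    tokens = command.split()
    code = tokens[0]
    # Numeric fields like '10.,' or '-45.,' lose their trailing comma; '*' ends the line.
    args = [float(t.rstrip(",")) for t in tokens[1:] if t != "*"]
    if code == "ST":
        return False
    if code == "TU":  # turn counterclockwise N degrees
        pose.an = (pose.an + args[0]) % 360.0
    elif code == "MO":  # move forward N feet along the current heading
        heading = math.radians(pose.an)
        pose.xr += args[0] * math.cos(heading)
        pose.yr += args[0] * math.sin(heading)
    elif code == "XR":
        pose.xr = args[0]
    elif code == "YR":
        pose.yr = args[0]
    elif code == "AN":
        pose.an = args[0] % 360.0
    elif code == "WH":  # record current XR, YR, AN
        out.append(f"XR={pose.xr:.1f} YR={pose.yr:.1f} AN={pose.an:.1f}")
    else:
        raise ValueError(f"command not modeled in this sketch: {code}")
    return True


def run(program: list[str], pose: Pose) -> list[str]:
    """Execute commands in order, halting at ST; return WH output lines."""
    out: list[str] = []
    for command in program:
        if not execute(command, pose, out):
            break
    return out


pose = Pose()
# AN 90 points the robot along +y; the C3 and C4 strings then move and turn it.
out = run(["AN 90., *", "MO 10., *", "TU -45., *", "WH *", "ST *", "MO 5., *"], pose)
print(out)
```

Because ST halts the program, the final `MO 5., *` is never executed. Under the counterclockwise convention of TU, "Turn right 45 degrees" becomes a negative argument, which is why C4 translates to `TU -45., *`.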